Human Parsing


Human parsing is the process of identifying, segmenting, and categorizing different parts of a human body in an image or video such as head, shoulders, knees, and toes.

LaDy: Lagrangian-Dynamic Informed Network for Skeleton-based Action Segmentation via Spatial-Temporal Modulation

Add code
Mar 25, 2026
Viaarxiv icon

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Add code
Mar 25, 2026
Viaarxiv icon

MMTIT-Bench: A Multilingual and Multi-Scenario Benchmark with Cognition-Perception-Reasoning Guided Text-Image Machine Translation

Add code
Mar 25, 2026
Viaarxiv icon

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

Add code
Mar 25, 2026
Viaarxiv icon

Learning Trajectory-Aware Multimodal Large Language Models for Video Reasoning Segmentation

Add code
Mar 23, 2026
Viaarxiv icon

ALARA for Agents: Least-Privilege Context Engineering Through Portable Composable Multi-Agent Teams

Add code
Mar 20, 2026
Viaarxiv icon

A Mathematical Theory of Understanding

Add code
Mar 19, 2026
Viaarxiv icon

UniGround: Universal 3D Visual Grounding via Training-Free Scene Parsing

Add code
Mar 09, 2026
Viaarxiv icon

Semantic Risk Scoring of Aggregated Metrics: An AI-Driven Approach for Healthcare Data Governance

Add code
Mar 09, 2026
Viaarxiv icon

What Exactly do Children Receive in Language Acquisition? A Case Study on CHILDES with Automated Detection of Filler-Gap Dependencies

Add code
Mar 02, 2026
Viaarxiv icon